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Abstract 

Background: Starch serves as a temporal storage of carbohydrates in plant leaves during day/night cycles. To study 
transcriptional regulatory modules of this dynamic metabolic process, we conducted gene regulation network 
analysis based on small-sample inference of graphical Gaussian model (GGM). 

Results: Time-series significant analysis was applied for Arabidopsis leaf transcriptome data to obtain a set of genes 
that are highly regulated under a diurnal cycle. A total of 1,480 diurnally regulated genes included 21 starch 
metabolic enzymes, 6 clock-associated genes, and 106 transcription factors (TF). A starch-clock-TF gene regulation 
network comprising 1 17 nodes and 266 edges was constructed by GGM from these 133 significant genes that 
are potentially related to the diurnal control of starch metabolism. From this network, we found that |3-amylase 
3 (b-amy3: At4g 17090), which participates in starch degradation in chloroplast, is the most frequently connected 
gene (a hub gene). The robustness of gene-to-gene regulatory network was further analyzed by TF binding site 
prediction and by evaluating global co-expression of TFs and target starch metabolic enzymes. As a result, two TFs, 
indeterminate domain 5 (AtlDDS: At2g02070) and constans-like (COL: At2g21320), were identified as positive 
regulators of starch synthase 4 (SS4: At4g 18240). The inference model of AtlDDS-dependent positive regulation of 
554 gene expression was experimentally supported by decreased 554 mRNA accumulation in AtiddS mutant plants 
during the light period of both short and long day conditions. COL was also shown to positively control 554 mRNA 
accumulation. Furthermore, the knockout of AtlDDS and COL led to deformation of chloroplast and its contained 
starch granules. This deformity also affected the number of starch granules per chloroplast, which increased 
significantly in both knockout mutant lines. 

Conclusions: In this study, we utilized a systematic approach of microarray analysis to discover the transcriptional 
regulatory network of starch metabolism in Arabidopsis leaves. With this inference method, the starch regulatory 
network of Arabidopsis was found to be strongly associated with clock genes and TFs, of which AtlDDS and COL 
were evidenced to control 554 gene expression and starch granule formation in chloroplasts. 
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Background 

Starch is an insoluble glucose polymer stored in seeds 
and storage organs of plants. The starch molecule is 
composed of two types of glucose polymers— amylose 
and amylopectin— and is organized to form a distinct 
structure called a starch granule. It is commonly 
accepted that biosynthesis of starch takes place during 
the day using an excess sugar residue from photosyn- 
thesis as a substrate. At night, starch granules in leaves 
are decomposed to sugars to be transported to seeds or 
storage organs and stored as reserved carbohydrates or 
used as precursors in other metabolic pathways [1-3]. 
Besides starch synthase, various enzymes and proteins 
have been identified to play unexpected roles in starch 
biosynthesis, metabolism and granule formation [4-11]. 

With regard to regulation of starch biosynthesis and me- 
tabolism, post-translational protein modifications have 
major impacts on controlling the enzyme activities [12-14]. 
AUosteric regulation of ADP-glucose pyrophosphorylase 
(AGPase) [15-17] and redox modulation of puUulanase- 
type debranching enzymes [18,19], glucan-water-dikinase 
(GWD) [20] and |3-amylase [21] indicate the significance of 
post-translational regulatory mechanisms. Protein phos- 
phorylation and formation of multi-protein complexes of 
starch synthase (SS), branching enzyme (BE), debranching 
enzyme (DBE), and starch phosphorylase (SP) suggest tight 
linkages of metabolic pathways through modification and 
physical interactions of the enzymes (reviewed in [12]). In 
addition to post-translational mechanisms, genes encoding 
starch metabolic enzymes are also known to be regulated 
under transcriptional control. In barley, a sugar-inducible 
transcription factor (TF) in the WRI<Y family, SUSIBA 2, is 
reported to act as an activator in endosperm starch biosyn- 
thesis [22]. In rice, a complex of a MYC protein (OsBP-5) 
and an EREBP protein (OsEBP-89) is proposed to be a 
transcriptional regulator of the rice Wx gene, whose prod- 
uct, namely the granule-bound starch synthase (GBSS), is 
responsible for synthesis of amylose in mature seeds [23]. 
Additional finding in Arabidopsis indicates that expression 
of the GBSS'I gene is controlled by 2 main clock TPs, cir- 
cadian clock associated 1 (CCAl: At2g46830) and late 
elongated hypocotyl (LHY: Atlg01060) [24]. The roles of 
these TPs suggest the significance of transcriptional 
mechanisms, although gene regulatory networks of starch 
metabolism remain largely uncharacterized. 

Inference methods for construction of gene regulatory 
networks have been extensively developed after genome- 
wide microarray repositories became publicly available 
[25-31]. Reverse-engineering approaches of Arabidopsis 
gene regulatory network reconstruction utilizing large- 
scale microarray experiments have been previously pro- 
posed and revealed the Arabidopsis regulatory network 
models from different viewpoints [28-32]. Carrera and 
colleagues [30] applied a qualitative network model based 



on a probabilistic model and linear regression to 1,436 
Arabidopsis microarrays, and analyzed topological para- 
meters of the network. Their result showed that genes 
having cellular functions involved in responses and adap- 
tation to environmental changes tended to have higher 
connectivity than genes not related to stress responses. 
Another approach of the gene network reconstruction 
from large-scaled Arabidopsis microarrays is proposed by 
Mao et al, [32]. They constructed a genome-wide co-ex- 
pression network of Arabidopsis from 1,094 microarrays 
based on Pearson correlation and analyzed modular 
structures of the networks that are functionally related. 
Significantly enriched pathway terms were then analyzed 
for the predicted modules. One module was defined to 
be enriched in starch metabolism; 9 out of 10 genes con- 
tained in this module were related to starch metabolism. 
Since the genes in the same module were predicted from 
co-expression across various conditions, it is suggested 
that these starch metabolic genes are potentially co-regu- 
lated. The other method successfully utilized to construct 
a regulatory network is gene expression analysis using a 
modified graphical Gaussian model (GGM) [33,34]. The 
modified GGM is considered appropriate for analysis of 
microarray data that usually has a high-dimensionality 
problem (i.e. the number of genes is much higher than 
the number of measurements). This technique has been 
applied to 2,045 Arabidopsis microarrays to construct a 
gene network which can be subdivided to sub-network 
structures [29]. A number of sub-networks identified 
through this approach are suggested to be related to me- 
tabolism and stress responses. One of them is considered 
as a starch catabolism sub-network, where 7 out of 15 
genes present in the network are apparently relevant to 
starch degradation pathways [29] . 

Given these backgrounds, systematic analysis of micro- 
array data appears to provide insights into gene regula- 
tory networks of starch metabolism in Arabidopsis, In 
this work, an inference model was constructed from a di- 
urnal cycle microarray dataset to identify candidate tran- 
scriptional regulators of plant starch metabolism. This 
approach is based on evidence that biosynthesis and deg- 
radation of leaf starch is completed within 24 hours 
[3,35-37], and hypothesizes that regulators of our inter- 
ests are co-expressed and oscillated with genes involved 
in starch metabolic processes under a day-night cycle 
[24,36-38]. Firstly, we identified genes whose expression 
profiles changed over an observed 24-hr time period. 
Among these temporally regulated significant' genes, a 
set of starch metabolic genes, TPs, and clock genes was 
utilized for the construction of gene association networks 
using the small sample inference framework of GGM 
[33,34]. Subsequently, a few pairs of TPs and starch 
metabolic genes were selected based on their correlation 
coefficients from global co-expression profiles. Finally, 
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validation of the relationships between the selected TFs 
and starch metabolic genes were carried out using TF 
loss-of-function mutant lines. The results obtained from 
this study have led us to identify the involvement of TFs, 
indeterminate domainS (AtlDDS: At2g02070) and 
constans-like (COL: At2g21320), in transcriptional regu- 
lation of an Arabidopsis starch metabolic gene. The work 
presented here provides a model for systematic under- 
standing of regulatory networks of starch metabolic path- 
way applicable for modification of starch synthesis and 
accumulation. 

Results and Discussion 

Initial screening of significant genes in a diurnal cycle 

A time-series significant analysis was performed using 
the Extraction of Differential Gene Expression (EDGE) 
software package. This software can identify differentially 
expressed genes from both typical and time-course 
microarray experiments [39,40] . In this research, the soft- 
ware was applied to detect changes in Arabidopsis gene 
expression occurring within a 24 hour period [36]. This 
data set is the time-course measurement of transcripts 
extracted from fully- expanded leaves of Arabidopsis 
grown under a 12-hour-light and 12-hour-dark (12 L/12D) 
condition. The samples are taken after 1, 2, 4, 8 and 12 
hours in darkness or light, starting from the end or 



beginning of the light period, respectively. The concept of 
EDGE is based on the hypothesis testing of differential gene 
expression patterns fitted by natural cubic spline 
interpolation. Genes whose expression patterns deviate 
from a standard line within a 24-hour period would be 
detected as differentially expressed genes under a diurnal 
cycle. From approximately 22,000 Arabidopsis genes on the 
ATH-1 Affymetrix genome arrays, 1,480 genes were 
detected as significant genes (Q<0.01). These significant 
genes were clustered by k-means clustering, then function- 
ally classified in MapMan, a visualization tool for functional 
classification oi Arabidopsis [41]. Genes encoding TFs and 
those related to starch metabolism were used as inputs for 
GGM network construction. 

Co-expression analysis of 1,480 significant genes using 
k-means clustering 

Since the leaf starch content of Arabidopsis increases in 
the light period and decreases in the dark period [36], we 
hypothesized that the high expression levels of genes in 
the starch biosynthetic pathway would be observed dur- 
ing the day, while those in the degradation pathway 
would occur at night. To test this hypothesis, the expres- 
sion profile of all the significant genes were examined by 
k-means clustering (k = 30). All clusters were subjectively 
divided into 4 groups (Figure 1). Group A includes the 
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Figure 1 Thirty clusters of 1,480 significant genes using k-means clustering (k = 30). The number of gene members is shown at the top left 
of each cluster. The clusters where starch genes are present are indicated by red boxes. Four groups of 30 clusters were categorized by their 
expression patterns affected on light. (A) Tightly dark-positively and light-negatively regulated genes: the clusters of genes induced in the dark 
and repressed in the light periods. (B) Tightly dark-negatively and light-positively regulated genes: the clusters of genes repressed in the dark and 
induced in the light periods. (C) Light-positively regulated genes: the cluster of genes whose expressions are constant but the levels are elevated 
at dark to light phase transition. (D) Light-negatively regulated genes: the cluster of genes whose expressions are constant but the levels decline 
at dark to light phase transition. The black and white bar on the top of the figure corresponds to the dark and light period. 
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genes whose mRNA levels are increased in the dark and 
decreased in the light. In contrast, the group B members 
showed their mRNAs increased in the light and 
decreased in the dark. Group C consists of the gene 
members whose expression remains relatively stable ex- 
cept their mRNA levels were observed to increase at the 
dark-to-light transition phase. Group D is similar to 
group C but the expression goes down at the dark-to- 
light transition phase. According to our hypothesis, the 
starch biosynthetic genes are expected to be clustered in 
group B and C since the gene expression increases when 
the light period starts. On the other hand, the genes in 
starch degradation whose expression are expected to in- 
crease in the dark period should be the members of 
group A and D. 

According to Smith et al [36], there are 48 genes 
related to starch metabolism. In the group of 1,480 sig- 
nificant genes identified in this study, 21 out of 48 starch 
metabolic genes were present, 9 of which are related to 
starch synthesis and 12 of which are known to function 
in starch degradation pathway (Table 1). By using k- 
means clustering, these starch metabolic genes were clas- 
sified into 11 different clusters (Figure 1). Most of the 
starch synthesis genes (7 out of 9 genes) are observed in 
the clusters of groups B and C. This result supports the 
hypothesis that starch biosynthetic genes should be up- 
regulated during the day time. The two starch biosyn- 
thetic genes that do not follow this rule are those coding 
for granule-bound starch synthase (GBSS: Atlg32900) 
and starch synthase 2 (SS2: At3g01180). The expressions 
of both genes show distinct diurnal patterns distinguish- 
able from other starch synthase genes [36]. Since their 
products are reported to be embedded in the starch gran- 
ules that are daily destroyed at the night period [36], their 
high expression levels after the onset of the light are con- 
sidered necessary to regenerate GBSS and SS2 proteins 
for starch biosynthesis during the light period. In con- 
trast, only 2 of 12 total genes encoding for enzymes in 
starch degradation pathways, a-amylase 2 (a-AMY2: 
Atlg76130) and |3-amylase 9 (b-AMY9: At5gl8670), 
showed expression patterns correlated with the starch 
content profile. One explanation is that starch degrading 
enzymes might need a lag time in post-transcriptional or 
translational processes to become catalytically functional. 

Functional categories of 1,480 significant genes 

The 1,480 significant genes were categorized according to 
the functions defined in MapMan (Table 2). The largest 
functional group containing 471 genes, which accounted 
for -32% of total significant genes, was classified as "Not 
assigned". Out of the 471 genes in this group, 276 genes 
(-59%) were identified as unknown expressed proteins. 
The second largest group was the "RNA" group. This 
group contains 177 genes whose functions are related to 



Table 1 Twenty-one differentially expressed starch genes 



(Q<0.01) under the diurnal condition 


AGI 


Name 


Description 


Cluster 


Group 


Atlg32900 


GBSS 


Granule-bound starch 
synthase 


19 


A 


Atlg76130 


a-AMY2 


a-amylase, putative 


8 


A 


AtSgOl 180 


SS2 


Starch synthase 2 


8 


A 


At5g 18670 


b-AMY9 


(3-amylase, putative (BMY3) 


19 


A 


AtlgOSSlO 


ISA2/DBE1 


Isoamylase, putative 


2 


B 


Atlg 10760 


GWDl 


Glucan water dikinase 


10 


B 


Atlg69830 


a-AMY3 


a-amylase, putative 


6 


B 


At2g36390 


SBE2-1/BE3 


1,4-a-glucan branching 
enzyme 


22 


B 


At2a40840 


DPE2 


GI\/rn<^iHp h\/Hrnl?i<^p 
family 77 protein 


23 


B 


At3g29320 


StP, Plast 


Glucan phosphorylase, 
putative 


6 


B 


At3g46970 


Stp, Cyt 


Starch phosphorylase, 
putative 


6 


B 


At4g09020 


IS03 


Isoamylase, putative 


6 


B 


At4g 18240 


SS4 


Starch synthase 4 


22 


B 


At5g 11720 


AGLU 
like-4 


a-glucosidasel(AGLUl) 


20 


B 


At5g24300 


SSI 


Starch synthase 1 


2 


B 


At5g26570 


GWD/ 
PWD 


Glycoside hydrolase 
starch-binding 
domain-containing 
protein 


10 


B 


At5g51820 


PGM 


Phosphoglucomutase 


22 


B 


At5g64860 


DPEl 


disproportionating 
enzyme, putative 


10 


B 


At2g32290 


b-AMY6 


(3-amylase, putative 


13 


C 


At2g39930 


ISAl 


Isoamylase, putative 


16 


C 


At4g 17090 


b-AMY3 


P-amylase (CT-BMY) 


24 


C 



Cluster indicates the cluster number analyzed by k-means clustering (k = 30). 
All clusters were subjectively divided Into 4 groups— A, B, C, and 
D — according to the expression pattern In each cluster (Figure 1). 



RNA processing, transcription process, and regulation of 
transcription. As a result, 106 TF genes were assigned in 
the RNA group. Among 24 significant genes in the "Major 
carbohydrate metabolism" group (Table 2), 21 genes are 
starch-related, while the other 3 are related to sucrose 
metabolism. Additionally, our significant gene set also 
includes both clock and clock-regulated genes such as 
circadian clock associated 1 (CCAl: At2g46830), late 
elongated hypocotyl {LHY: Atlg01060), early-flowering 3 
{ELF3: At2g25930), phytochrome B {PHYB: At2gl8790), 
timing of cab expression 1 (TOCl: At5g61380), and ca- 
sein kinase II beta-chain {CKB3: At3g60250). These TF, 
clock, and starch metabolic genes whose mRNA levels 
are significantly modulated during a diurnal cycle were 
used for reconstruction of transcriptional regulatory net- 
work of starch metabolism. 
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Table 2 Functional categories of 1,480 significant genes 
defined by the MapMan tool 



Functional category 


Number 




Group 






of genes 


A 


B 


C 


D 


Photosynthesis 


33 


9 


8 


12 


4 


Major carbohydrate metabolism 


24 


6 


15 


3 


0 


Minor carbohydrate metabolism 


17 


6 


3 


4 


4 


Glycolysis 


10 


0 


9 


1 


0 


Fermentation 


2 


0 


2 


0 


0 


Gluconeogenesis/glyoxylate cycle 


3 


1 


0 


0 


2 


Oxidative pentose phosphate pathway 


4 


0 


3 


0 


1 


TCA / organism Transformation 


11 


3 


4 


3 


1 


Mitochondrial electron 
transport /ATP synthesis 


7 


3 


2 


0 


2 


Cell wall 


19 


6 


6 


2 


5 


Lipid metabolism 


41 


13 


12 


9 


7 


N-metabolism 


2 


1 


0 


0 


1 


Amino acid metabolism 


38 


14 


13 


5 


6 


S-assimilation 


2 


2 


0 


0 


0 


Metal handling 


7 


3 


3 


1 


0 


Secondary metabolism 


32 


21 


5 


5 


1 


Hormone metabolism 


36 


11 


13 


8 


4 


Co-factor and vitamin metabolism 


8 


2 


4 


0 


2 


Tetrapyrrole synthesis 


6 


3 


2 


1 


0 


Stress 


61 


11 


24 


16 


10 


Redox regulation 


26 


9 


11 


3 


3 


Polyamine metabolism 


6 


4 


1 


1 


0 


Nucleotide metabolism 


16 


3 


9 


2 


2 


Biodegradation of xenobiotics 


4 


1 


0 


1 


2 


CI -metabolism 


6 


1 


4 


0 


1 


Miscellaneous 


73 


20 


30 


14 


9 


RNA 


177 


55 


57 


37 


28 


DNA 


29 


5 


17 


5 


2 


Protein 


157 


41 


64 


17 


35 


Signalling 


74 


25 


24 


12 


13 


Cell 


53 


9 


26 


5 


13 


Development 


37 


9 


12 


10 


6 


Transport 


81 


40 


19 


15 


7 


Not assigned 


471 


159 


148 


67 


97 



Groups A, B, C, and D correspond to classifications in k-means clustering 
analysis presented in Figure 1. 



Gene regulation network of starch metabolism in 
Arabidopsis leaves 

To further investigate the transcriptional regulation of 
starch metabolic pathway, only metabolic genes in starch 
metabolism and all possible regulators (i.e. TF and clock 
genes) were focused in this research. The 133 significant 



genes —composed of 21 starch metabolic genes, 106 TF 
genes, and 6 clock genes— were subjected to GGM net- 
work construction [33]. Since GGM generates a condi- 
tional network depending on an input genes set, 
different sets of input genes are influent to the resulted 
network. Preservation of the TF-starch relationships in 
the networks constructed from expanded gene sets indi- 
cated the robustness of the network reconstructed by 
the focused set of starch metabolic and regulator genes 
(discussed in the following section). 

The gene association network of starch metabolic path- 
way was constructed using small sample inference of 
GGM implemented in the R package 'GeneNet' [33]. The 
transcript profiles of 133 significant genes were retrieved 
from the original microarray data [36], and transformed 
to log-base 2 scales before inferring the gene association 
network. In the resulting network, a node represents a 
gene and an interaction between 2 genes is called an 
edge. Each edge indicates a correlated expression of any 
2 genes after removing effects of other genes in a study 
set. The hypothesis testing of non-zero partial correlation 
and false discovery rate (FDR), multiple testing correc- 
tion were employed to obtain significant correlation coef- 
ficients (i.e. significant edges) in the final network. From 
a total of 133 genes, the final network derived from 
GGM analysis at Q<0.05 consisted of 117 nodes (or 
genes), corresponding with 16 starch (7 synthesis and 9 
degradation), 97 TF, and 4 clock genes, and 266 edges 
(or, an association between 2 genes) (Figure 2). These 
edges could be the associations between 1) regulator and 
regulator genes (i.e. TF-TF, clock-TF, and clock-clock), 2) 
regulator and target genes (i.e. TF-starch and clock- 
starch), and 3) target and target genes (i.e. starch-starch). 
There are 215, 42, and 9 edges for the first, second, and 
third edge types, respectively. 

Since 80% of the genes used in the network recon- 
struction are TF genes, predominant interactions in the 
final network represent those between TFs. To verify the 
significance of these TF-TF interactions and the robust- 
ness of the starch genes association network, especially 
the TF-starch relationships shown in Figure 2, another 
gene association network was reconstructed by expand- 
ing a set of metabolic genes from only genes in starch 
metabolism to genes in broader carbon-related metabo- 
lisms. The same set of 106 TFs, 6 clock genes, and 171 
metabolic genes selected from 11 carbon-related func- 
tional groups categorized to be related to photosynthesis, 
major carbohydrate metabolism, minor carbohydrate 
metabolism, glycolysis, fermentation, gluconeogenesis/ 
glyoxylate cycle, oxidative pentose phosphate pathway, 
TCA/organism transformation, mitochondrial electron 
transport/ ATP synthesis, cell wall, and lipid metabolism 
by MapMan (Table 2) were utilized for network recon- 
struction. It should be noted that all 21 starch genes 



Ingkasuwan et al. BMC Systems Biology 2012, 6:100 
httpy/www.biomedcentral.com/l 752-0509/6/1 00 



Page 6 of 21 




Figure 2 The gene association networic of starch metabolism inferred from GGM (Q<0.05). The network contains 1 17 nodes and 266 
edges. A pink rectangle represents a TF gene; an orange diamond represents a clock gene; a green circle represent a starch gene; a black solid 
line represents an association with positive correlation; and a blue broken line represents an association with negative correlation. 

V J 



are present in the major carbohydrate metabolism cat- 
egory, thus included in the metabolic gene set. The 
gene association network of carbon-related metabolisms 
is shown in Additional file 1: Figure SI. In this 
expanded network, 151 out of 215 regulator- regulator 
edges (70%) and 26 out of 42 TF-starch or clock-starch 
relationships (62%) were identical with the starch net- 
work shown in Figure 2. All 6 TF candidates and their 
relationships to 5 starch genes that will be further ex- 
perimentally validated (discussed in the following sec- 
tion) exist in the expanded network. The result 
indicates that most of the regulatory relationships of 
the starch gene association network are robust and pre- 
served even when expanding an input gene set to 
carbon-related metabolic genes. 

According to Figure 3A, the node degree of TF and 
clock genes ranged from 1 to 14 connections. Among 
the starch metabolic genes, |3-amylase 3 {b-AMY3: 
At4gl7090) was detected as the most-connected node 
or a hub gene with 15 neighbors (Figure 3B). This re- 
sult may indicate the importance of b-AMY3 in the 



starch metabolism. The b-AMY3 represents a chloro- 
plastic p-amylase [21,36,42] with a possible role in the 
leaf starch degradation process [3,43-45]. In the b- 
AMY3 sub-network (Figure 3B), there are 2 starch 
metabolic genes, starch synthase 4 {SS4: At4gl8240) 
and p-amylase 6 {b'AMY6: At2g32290), that showed 
positive correlations with b-AMY3, There are reports 
indicating that SS4 involves in the starch granule initi- 
ation process [6,11], and b-AMY6 is repressed under 
heat shock stress [46]. However, no functional correl- 
ation among these 3 starch metabolic genes has ever 
been reported. 

In addition to gene-to-gene association with two starch 
metabolic genes, b-AMY3 was also observed to have 
positive correlation with various TF genes and a clock 
gene, TOCl, TOCl is an evening gene expressed during 
the night period with a major role in circadian rhythm 
[47-49]. From the k-means clustering, expression of b- 
AMY3 increased slightly during the night period, 
decreased at the dark-to-light transition phase, then 
increased again a few hours later. Our findings are not 
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Figure 3 p-amylase 3 is the most frequently connected gene (a hub gene), (a) The node degree distribution of starcli and TF genes in tlie 
gene association networl<. Tine gene witli tine liigliest degree of connection is (3-amylase 3 (b-AMY3: At4g 17090) as indicated by a blacl< arrow, 
(b) Sub-networl< of tine most-connected node (a liub gene), |3-amylase 3 at Q< 0.05. A pinl< rectangle represents a TF gene; an orange diamond 
represents a clocl< gene; a green circle represents a starch gene; a black solid line represents an association with positive correlation; and a blue 
broken line represents an association with negative correlation. 



only in agreement with the results indicating regulation 
of b'AMYS expression under diurnal and circadian 
rhythms [36,50,51], but they also suggest the circadian 
control of b-AMY3 via TOCl. 

From the GGM network construction, we additionally 
found that the sub-networks of two starch metabolic 
genes encoding GBSS (Atlg32900) and disproportionate 
ing enzyme (DPEl: At5g64860) and their TF neigh- 
bours are entirely separated from the rest of the 
network (Figure 2). These isolated modules indicate 
specific correlation between starch metabolic genes and 
their connected TFs. The LIM domain-containing pro- 
tein (At2g39900) associated with DPEl in the network is 
in fact WLIM2a, an actin-bundling protein that functions 



in cytoskeleton organization [52]; its gene expression is 
regulated by pickle (PKL), a member of CHD3 
chromatin- remodelling protein involved in seed germin- 
ation of Arabidopsis [53]. The other isolated module 
represents the association between GBSS and two zinc 
finger family proteins, constans-like {COL: At2g21320) 
and constans-like 7 (C0I7: Atlg73870). In rice, MYC 
and EREBP are known to act synergistically in transcrip- 
tional regulation of the rice Wx gene [23]. Since these 
homologues were not included in the GBSS sub-network 
of Arabidopsis, the results possibly suggest a difference 
in the mechanisms controlling the biosynthesis of storage 
starch (i.e. in rice endosperm) and transitory starch (i.e. 
in Arabidopsis leaves). 
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Sub-networks for transcriptional regulation of starch 
metabolism 

To identify candidate TFs that may play regulatory roles in 
starch metabolism, we focused on the relationships be- 
tween starch metabolic genes and their immediately con- 
nected TFs. The starch sub-network excerpted accordingly 
contained 49 nodes (16 starch, 1 clock, and 32 TP genes) 
and 79 edges (Figure 4A). Details of the genes in the starch 
sub-network are summarized in Additional file 2: Table SI. 
Within the starch sub-network, we observed 5 types of 
gene-to-gene interactions or edges; those were interactions 
between starch metabolic genes (8 edges), between TP 
genes (26 edges), between starch metabolic and TF genes 
(42 edges), between clock and starch metabolic genes 
(2 edges), and between clock and TF genes (1 edge). 

The Arabidopsis gene networks were previously recon- 
structed from global microarray conditions based on GGM 
[29] and co-expression analysis [32]. Interestingly, the sub- 
networks related to starch metabolism were extracted from 
both studies, even though different algorithms were ap- 
plied. The sub-networks derived from GGM [29] and co- 
expression analysis [32] contain a total of 15 and 10 genes, 
and from these gene set 10 and 9 were identified as starch 
metabolic genes, respectively. There are 6 genes in starch 
degradation, i.e. disproportionating enzyme 1 and 2 
(At5g64860 and At2g40840, respectively), glucan water 
dikinase 1 (starch excessl - Atlgl0760), glucan phosphat- 
ase (starch excess 4 - At3g52180), starch phosphorylase 2 
(At3g46970), and a-amylase 3 (Atlg69830), and one gene 
in starch synthesis, branching enzymes 3 (At2g36390) that 
both of these networks have in common. From total 15 
starch metabolic genes identified in the sub-networks of 
previous studies, 7 genes, which are starch synthase 4 
(At4gl8240), branching enzyme 3 (At2g36390), dispropor- 
tionating enzyme 2 (At2g40840), phosphoglucan water 
dikinase (At5g26570), isoamylase 2 (Atlg03310), cytosolic 
and plastidial starch phosphorylase (At3g46970 and 
At3g29320, respectively), were also observed in our starch 
sub-network (Additional file 2: Table SI). These results 
suggest coherent expressions of starch metabolic genes, 
especially those in the starch degradation process. It is 
worth to note that the starch sub-networks derived from 
those studies contain mainly metabolic genes, except 
one clock-regulated gene encoding pseudo-response 
regulator 3 (APRR3: At5g60100) which was identified in 
the starch sub-network derived from the GGM analysis 
[29]. However, this clock gene was not present in our 
starch sub-network. 

Prediction of TF binding sites in promoter sequences of 
target genes 

The 42 interactions between 12 starch and 32 TF genes 
were further analyzed by searching for the presence/ 
absence of TF binding sites in the promoter region of 



starch metabolic genes (Table 3). The analysis con- 
ducted for prediction of physical binding of TF to tar- 
get genes is TF family-based, thus, it stands on the 
assumption that different TFs of the same TF family 
most likely attach to the same TF-binding site on the 
promoter region of a target gene. 

In the TF family-based comparative analysis, all pos- 
sible plant TF binding sites in the 2-kb upstream region 
of the 12 target genes (i.e. starch metabolic genes) were 
first obtained using a web-based tool from AthaMap 
database http://www.athamap.de/ [54-57]. The names of 
known TFs and their relative binding locations pre- 
dicted within the 2-kb upstream region of 12 starch 
metabolic genes are summarized in Additional file 3: 
Table S2. From the list of all possible TF binding sites, 
physical binding was predicted between 10 TFs and 6 
starch metabolic genes, shown as 11 starch-TF interac- 
tions in the starch sub-network (Table 3). For easier 
visualization, we included information on the presence 
of putative TF-binding sites and re-drew the regulatory 
model of genes in the starch sub-network (Figure 4B). 
The results indicate that 10 TFs show only a positive 
correlation with 6 starch metabolic genes and can be 
classified into 7 families. 

Predicted regulatory modules for a model validation 

The robustness of the predicted regulatory network 
model of starch metabolism was further verified using a 
diverse set of "condition-independent" microarray data. 
This analysis has been implemented to find how the ex- 
pression patterns of any 2 genes under various condi- 
tions correlate. The data representing the relationship of 
Arabidopsis co-regulated genes was obtained from the 
ATTED-II database http://atted.jp/ [58,59]. In ATTED- 
II, the pair-wise correlation coefficients of 22,263 Arabi- 
dopsis genes were calculated from 58 experiments of 
GeneChip microarray (1,388 arrays in total) using 
weighted Pearson correlation. The pair-wise correlation 
coefficients of 12 starch metabolic genes with all Arabi- 
dopsis TF genes (1,849 genes listed in this database) were 
ranked from highest to lowest value (Table 3). 

The pair-wise correlation coefficients between given 
target genes and all TFs were observed as normally dis- 
tributed in this condition-independent analysis. The TF 
was thus considered significant when its correlation coef- 
ficient with its target starch metabolic gene was higher 
than the population mean with 97.5% confidence ana- 
lysed by the single-sample ^-test (one-tailed). From all 
the TFs in the starch sub-network, 8 TF genes passed this 
cut-off by being significantly and highly correlated with 
their target gene expression, and they were preliminarily 
chosen as candidates for experimental validation (dis- 
cussed in the following section). However, among these 8 
candidates, the expression patterns of 2 TF genes. 



Ingkasuwan et al. BMC Systems Biology 2012, 6:100 
httpy/www.biomedcentral.com/l 752-0509/6/1 00 



Page 9 of 21 




Ingkasuwan et al. BMC Systems Biology 2012, 6:100 
httpy/www.biomedcentral.com/l 752-0509/6/1 00 



Page 10 of 21 



(See figure on previous page.) 

Figure 4 The starch sub-network of 16 starch genes and their first step neighbours (Q<0.05). (a) Tlie grapliical display of tlie starcli sub- 
networl<. A pinl< rectangle represents a TF gene; an orange diamond represents a clock gene; a green circle represents a starch gene; a black solid 
line represents an association with positive correlation; and a blue broken line represents an association with negative correlation, (b) The 
metabolic display of the starch sub-network. A pink oval represents a TF gene; A red oval represents a TF gene predicted to have physical 
binding sites in target genes according to the comparative TF family analysis; an orange diamond represents a clock gene; a green rectangle 
represents a starch gene; a blue rectangle represents a metabolite; a black line represents an association with positive correlation; a red line 
represents an association with negative correlation; and a blue line represents a reaction in starch metabolism. 

V J 



constans-like 9 (C0I9: At3g07650) and bHLH 
(Atlg05805), did not adhere to the simple assumption that 
a TF should express at the same time, or earlier, than its 
target gene. The diurnal expression patterns of both C0L9 
and bHLH indicated that their induction took place after 
the expression of their potential target starch metabolic 
genes (Additional file 4: Figure S2). Therefore, by excluding 
these 2 TFs, the rest 6 candidate TF genes— A t/DDS 
(At2g02070), C2H2 (At3g50700), COL (At2g21320), C0L7 
(Atlg73870), WLIM2a (At2g39900), and KH-CCCH 
(At5g06770), each predicted as regulators for the expres- 
sion of 554, a-glucosidase-like 4 {AGLU-like 4: At5gll720), 
GBSS (co-regulated by COL and COLT), DPEl, and starch 
synthase 1 {SSI: At5g24300), respectively (Table 4)— were 
selected as final candidates and subsequently used in ex- 
perimental validation. According to the TF family-based 
prediction of binding sequences (Tables 3 and 4) of these 
6 candidates, putative binding sites for zinc finger C2H2 
type TFs were located in the promoter regions of their 
target genes, SS4 and AGLU-like 4, Accordingly, the 
gene-to-gene associations between AtlDDS and SS4, and 
between C2H2 and AGLU-like 4, gain strong support 
from both the TF binding site prediction and the global 
co-expression analysis. 

Expression analysis of target genes predicted in the 
regulatory modules 

To experimentally verify the regulatory role of the 6 can- 
didate TFs in the proposed TF-target model (Figure 4B; 
Table 4), the accumulation of starch metabolic gene 
transcripts was determined using homozygous knockout 
lines. T-DNA or transposon inserted knockout lines — 
AtiddSy c2h2, col, col7, wlim2a, and kh-ccch—were 
obtained from The Arabidopsis Biological Resource 
Centre (ABRC) [60] and The Nottingham Arabidopsis 
Stock Centre (NASC) [61]. The homozygous mutant 
plants were grown under the conditions described in 
[36] (See method). Rosette leaves at 3.90 developmental 
stage [62] were harvested 4 times within a 24- hour 
period, and used as materials for RNA extraction. The 
mRNA levels of TFs and target genes were quantified by 
quantitative real-time reverse transcription polymerase 
chain reaction (qRT-PCR) analysis. 

AtlDDS, C2H2, C0L7, and KH-CCCH mRNAs were 
absent in AtiddS, c2h2, col?, and kh-ccch mutant lines. 



respectively, whereas COL and WLIM2a mRNAs were 
partially detected in col and wlim2a mutant lines, re- 
spectively (data not shown). The mRNA accumulations 
of target starch metabolic genes were then monitored in 
these mutant lines to determine the effect of disruption 
of TFs predicted to act as regulators. The results indi- 
cated that a starch metabolic gene, SS4, showed a 
decreased mRNA level in AtiddS mutant (Figure 5A). 
Significant down-regulation of SS4 was observed at the 
end of the light period of both short and long days 
(Figures 5 A and 5B). It appears that AtlDDS plays an 
important role in the regulation of SS4 gene expression. 
By contrast, the mRNA levels of AGLU-like 4, GBSS, 
DPEl, and SSI genes were not different between the 
wild type and mutant lines — c2h2, col col7, wlim2a, 
and kh-ccch, respectively — during the time course of 
12 L/12D condition (Additional file 5: Figure S3). 

In addition to the effect of direct TF-target relation- 
ships, the regulatory network can be affected by the 
associated modules particularly when the target genes 
are in the same metabolic process. Based on this as- 
sumption, the SS4 mRNA levels were determined in 
other TF mutant lines. Similar to the results obtained 
from the AtiddS mutant, SS4 was down-regulated in the 
col mutant during the light period of both short and 
long day conditions (Figure 5 A and 5B). Alteration of 
SS4 expression may have resulted from (i) the negative 
effect of disruption of COL on AtlDDS gene expres- 
sion or (ii) the direct control of SS4 by COL. Since 
the mRNA levels of AtlDDS in the wild type and the 
col mutant were different only in the long day condi- 
tion (Additional file 6: Figure S4), regulation of AtlDDS 
expression by COL therefore remains inconclusive. Al- 
though the exact underlying mechanism is unknown, 
COL appears to be in part of the starch metabolic gene 
regulatory network, and its relevance is further evidenced 
by starch granule deformation (discussed in the following 
section). 

According to in silico prediction of the Arabidopsis 
proteome, 176 proteins are classified in the C2H2 zinc 
finger family [63]. AtlDDS belongs to this C2H2 gene 
family and is further classified into the same sub-family 
as the maize indeterminate 1 gene (ZmlDl), which is a 
key regulator of flowering transition [63,64]. In Arabi- 
dopsis, there are 16 homologues of AtlDD genes. Among 



Table 3 The TF-starch relationships from the starch sub-network 
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TF Family AthaMap^ Rank of TF-starch correlation in ATTED 
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TFs in a family 
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Table 3 The TF-starch relationships from the starch sub-network (Continued) 



At4g 18240 
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-0.0671 / n 
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N 
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NA 
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NA 
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30 
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Global-expression correlations extracted from ATTED database and the correlation ranks were shown in the table. 

1. pCor is partial correlation of a gene pair under the diurnal condition predicted by GGM. p and n represent a positive and negative interactions, respectively. 

2. Y = Physical binding of TPs to target genes was predicted by the comparative TF family analysis; N = No physical binding of TPs was predicted; NA = no TF binding site information in AthaMap. 
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Table 4 Six candidate TF genes and their ranks based on the global-expression correlations 



Starch metabolic gene 



TF gene 



TF family 



Binding Site ATTED Correlation TF rank Family rank 



At4g 18240 


SS4 


At2g02070 


AtlDDS 


C2H2 


Y 


0.542 


9 


1/122 


At5g 11720 


AGLU- like 4 


At3g50700 


C2H2 


C2H2 


Y 


0.295 


87 


5/122 


At5g64860 


DPE 1 


At2g39900 


WLIM2a 


LIM 


n.a. 


0.715 


1 


1/13 


Atl 932900 


GBSS 


Atlg73870 


C0L7 


C2C2-CO-like 


n.a. 


0.609 


4 


3/42 


Atlg32900 


GBSS 


At2g21320 


COL 


C2C2-CO-like 


n.a. 


0.608 


5 


4/42 


At5g24300 


SSI 


At5g06770 


KH-CCCH 


C3H 


n.a. 


0.422 


31 


3/172 



1. Y = potential binding site of a TF was found at the 2,000-base upstream region of target gene by the comparative TF family analysis; n.a. = information of a 
binding site of that TF is not available in AthaMap http://www.athamap.de/. 

2. Pearson correlation calculated from the 58 GeneChip experiments including 1,388 arrays using the weighted Pearson correlation in ATTED database http:// 
atted.jp/. 

3. Rank of TF based on the ATTED correlation of 1,849 TF genes in the database. 

4. Rank of TF based on the ATTED correlation of TF genes in each specific TF family. The numerator indicates the rank and the denominator indicates the number 
of genes in a specific TF family. 



them, the biological functions of magpie {MGP/ AtlDDS; 
Atlg03840), nutcracker {NUCI AtlDDS) At5g44160), and 
jackdaw (JKD/AtlDDlO; At5g03150) genes have been 
characterized [65,66]. To date, there are only three 
AtlDD genes— NUC/AtIDD8, AtlDDM (Atlg68130), 
and shoot graviropism 5 {SGRS/AtlDDlS; At2g01940)— 
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Figure 5 SS4 gene expression in the wild type, AtiddS and col 
mutants. The relative transcript level of SS4 mRNA was measured in 
Arobidopsis grown under (a) short (12 L/12D) and (b) long day 
(16 L/8D) conditions. Ubiquitin 2 (UBQ2) was used as a constitutive 
control for normalization. 



that have been reported to play roles, though indirectly, 
in sugar and starch metabolism [67-69]. Based on 
phylogenetic analysis, the AtlDD genes most closely 
related to AtlDDS are AtIDD4 (At2g02080) and AtIDD6 
(Atlgl4580). AtIDD4 is reported as a TF whose expres- 
sion is affected by defects in chloroplast import ma- 
chinery, and it is postulated to function as a 
transcriptional activator of nuclear-encoded photosyn- 
thetic gene expression [70]. In addition, AtIDD4 and 
AtIDD6 are identified as gibberellin-regulated genes 
[71]. Based on the evidence known to date, both 
AtIDD4 and AtIDD6 do not seem to have any function 
related to sugar and starch metabolism. 

The database of global gene expression analysis pro- 
vides evidence showing that AtlDDS is abundantly 
expressed in leaf tissues. The global view of AtlDDS 
gene expression was roughly examined using the Gene- 
vestigator web-based software [72]. The expression of 
AtlDDS was observed ubiquitously in all stages, but its 
level was particularly high from stages of developed ros- 
ette leaves to developed flowers. In flowers, expression 
pattern of AtlDDS was classified as stamen-specific lack 




AtiddS 



kh-ccch 



WT (Col-0) 

Figure 6 Total starch content of leaves in the wild type and 

AtlddS, col, col7, and kh-ccch mutants. Arobidopsis plants were 
grown under short (12 L/12D; grey bar) and long day (16 L/8D; 
black bar) conditions, respectively. 
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Figure 7 Transmission electron micrographs of the wild type, 

AtiddS, and col chloroplast. Length of a scale bar indicated in the 
micrograph is 2 pm. 

v J 



of expression', suggesting that its expression disappears, 
especially in anthers of flowers from stage 7 to 11 
[73,74]. 

Information on AtIDD5-interacting proteins further 
suggests that AtlDDS is associated with other signal- 
ling components, such as radical-induced cell death 1 
(RCDl: Atlg32230) [75]. RCDl is not only known as 
clone eighty-one (CEOl), which recovers the oxidative 
stress-sensitivity phenotype of the Yapl' mutant yeast 



[76], but also a major regulator of hormonal signalling 
and stress-response processes [77,78]. According to the 
AGRIS database http://arabidopsis.med.ohio-state.edu/ 
REIN/, AtlDDS is predicted to interact with a MADS- 
box domain TF, sepallata 3 (SEP3), which is a global 
moderator of multifunctional protein-complexes con- 
trolling flowering and hormonal signaling processes, 
especially responses to auxin stimuli [79]. 

Total starch content and granule morphology analysis 

Total starch content was measured under both short 
and long day conditions. Starch was extracted from fully 
expanded leaves of all the mutants and the wild type, 
and analyzed in the form of glucose by capillary 
electrophoresis-diode array detector (CE-DAD) after en- 
zymatic digestion (see method). Both the mutants and 
the wild type accumulated starch at relatively similar 
levels (Figure 6), even though SS4 was down- regulated 
in AtiddS and col mutants (Figure 5). It has been 
reported that the size of starch granules can be signifi- 
cantly altered in ss4 mutant while only 35% reduction of 
the starch content could be observed [6]. Referring to 
these previous findings, we speculated that changes in 
starch granule morphology and number may occur in 
AtiddS and col mutants as they show reduced levels of 
SS4 mRNA accumulation during the light period 
(Figure 5). 

Chloroplast and starch granule morphology of AtiddS 
and col mutant lines were examined by transmitted elec- 
tron microscopy (TEM). Transmission electron micro- 
graphs of AtiddS and col mutant lines and wild type are 
shown in Figure 7. They were analyzed by the image 
processing software Image J (version 1.45) to obtain a 
group of data sets including (i) size measured by Area 
and (ii) shape measured by 'Width', 'Height', and 'Circu- 
larity' of the chloroplast and starch granule cross-sec- 
tions. Although these parameters might not represent 
the actual size and shape of chloroplasts and starch 
granules, we considered them suitable for a comparative 
purpose. Since the data was not normally distributed, a 
non-parametric statistic, the Mann- Whitney U test, was 
applied for testing significant differences between the 
wild type and the mutants. The relative mean ranks and 
P-values from the Mann-Whitney U test are described 
in Table 5. Descriptive statistics of chloroplast and starch 
granule morphology of AtiddS and col mutants are sum- 
marized in Additional file 7: Table S3. 

Area, width, height, and circularity of 259, 176, and 
368 chloroplasts of AtiddS^ col, and the wild type, re- 
spectively, were measured using the ImageJ software. 
According to the Mann-Whitney U analysis, chloroplast 
area of the wild type was significantly larger than that of 
AtiddS and col mutants (P-value = 3.38E-26 and 2.99E- 
10, respectively) (Table 5) with the means of chloroplast 
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Table 5 The statistics from Mann- Whitney U test of chloroplast and starch morphology, and granule number 





Arabidopsis Line 


N' 


Area 


Width (urn) 


Height (nm) 


Circularity 


CHLOROPLAST 






Relative Mean Rank^ 










AtiddS 


259 


35.52 


41.96 


45.06 


42.75 




WT (Col-0) 


368 


60.33 


55.80 


53.61 


55.24 




col 


176 


38.81 


53.70 


33.94 


48.20 




WT (Col-0) 




55.49 


48.37 


57.82 


50.99 








P-value (Mann-Whitney U)^ 








AtiddS 




3.38E-26 


3.52E-09 


2.60E-04 


9.86E-08 




col 




2.99E-10 


0.0442 


1.90E-19 


0.2919 


STARCH GRANULE 






Relative Mean Rank 










AtiddS 


890 


45.63 


47.58 


47.35 


47.66 




WT (Col-0) 


895 


54.40 


52.47 


52.69 


52.38 




col 


505 


48.82 


54.72 


38.91 


60.53 




WT (Col-0) 




50.72 


47.40 


56.31 


44.11 








P-value (Mann-Whitney U) 








AtiddS 




1.38E-10 


3.45E-04 


9.54E-05 


5.54E-04 




col 




0.2352 


5.23E-06 


2.49E-27 


1.64E-24 


GRANULE NUMBER 






Number of starch granule / chloroplast 










Relative Mean Rank 










AtiddS 


283 


53.57 










WT (Col-0) 


408 


47.65 










col 


177 


54.18 










WT (Col-0) 




48.31 














P-value (Mann-Whitney U) 








AtiddS 




0.0067 










col 




0.0213 









1. N is a number of observed chloroplast (or starch granule) from Transmission electron micrographs. For the granule number analysis, N is a number of observed 
chloroplast. 

2. Relative Mean Rank was calculated from a ratio of mean rank to total rank and multiplied by 100 when the data was ranked in an ascending order. 

3. The P-value calculated from the Mann-Whitney Latest less than 0.05 was shown in bold letters indicating a significant difference of morphological parameters 
between the mutant and the wild type. 



areas at 13.18, 9.26, and 10.50 [im^ for the wild type, 
AtiddS, and col, respectively. In addition to the size, the 
shape of chloroplasts of both mutants also differed from 
the wild type. The width, height, and circularity of the 
chloroplasts were significantly smaller in AtiddS than in 
the wild type. The small circularity values of AtiddS 
chloroplasts indicate that they are in more oblong 
shapes relative to the wild type chloroplasts. In addition, 
the chloroplasts in the col mutant had longer width but 
less height than those in the wild type. The results, 
therefore, suggest that both mutants develop chloro- 
plasts with altered morphology, which, particularly, ap- 
pear smaller or thinner than the chloroplasts in the wild 
type (Figure 7 and Table 5). 

Since chloroplasts of both mutants were altered with 
respect to their size and shape, we examined their effects 
on the morphology of accumulated starch granules. 
Reduction of starch granule size, inferred from the cross- 



section area, was significant in AtiddS (P-value = 1.38E- 
10), but not in the col mutant. The means of starch 
granule areas of the wild type, AtiddS, and col were 0.54, 
0.42, and 0.50 (im^, respectively. In contrast, the granule 
shape deformity was noticed in both col and AtiddS 
mutants (Figure 7 and Table 5). The decrease in width, 
height, and circularity of AtiddS starch granule most 
likely suggested that the granule was small and in oblong 
shapes. As compared to the wild type, the col starch 
granules were observed to have greater circularity, 
suggesting that they were relatively round in shape. 

According to the work of Rolden and coworkers [6], a 
chloroplast of the ss4 mutant mostly contains one large 
starch granule. When examined under TEM— among 
283, 177, and 408 chloroplasts of AtiddS, col, and the 
wild type, respectively — none of the chloroplasts were 
observed to contain a single large starch granule like the 
ss4 mutant. On the other hand, the majority of 
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Figure 8 Distribution of granule number per chloroplast in the 
wild type, AtiddS, and col mutants. Means and P-value calculated 
from the Mann-Whitney U test are indicated. 



col 



Mean = 3.48 
P-value = 0.0213 



chloroplasts— corresponding to 25.4%, 27.1% and 27.5% 
of observed chloroplasts in AtiddS, col, and the wild 
type, respectively— normally contained 3 starch granules. 
We further investigated the distribution of starch granule 
number per chloroplast to find the difference between 
the mutants and wild type (Figure 8). In the AtiddS mu- 
tant, 86.2% of the observed chloroplasts contained 2-5 
starch granules, whereas 85.8% of chloroplasts from the 
wild type contained 1-4 starch granules. Interestingly, 
we observed that the number of chloroplasts containing 
2 and 4 starch granules in the col mutant was lower than 
those in the AtiddS and the wild type, whereas the num- 
ber of chloroplasts containing more than 5 granules was 



CD 



CO ^ 
Q. O 

o ^ 



O 



WT (Col-0) 



a 



123456789 10 



Atidd5 



1 23456789 10 



col 



123456789 10 

Number of granule / chloroplast 

Figure 9 Relationship between granule number and chloroplast 
size in the wild type, AtiddS, and col mutants. 



higher than the other lines (Figure 8). Moreover, the col 
mutant was the only line that was observed to contain 
up to 10 starch granules per chloroplast. The relative 
mean rank and P-value from the Mann- Whitney U test 
of the mutants and the wild type shown in Table 5 indi- 
cated that both AtiddS and col mutants had significantly 
higher numbers of starch granules per chloroplast than 
the wild type (P-value = 0.0067 and 0.0213, respectively). 
The results suggest that reduction of SS4 expression in 
the AtiddS and col mutant lines leads to a significant 
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increase in starch granule numbers, while their distribu- 
tions of granule number per chloroplast are differently 
affected among the mutants (Figure 8). 

The relationship between the size of chloroplast and 
the number of accumulated starch granules is shown in 
Figure 9. Our results indicate that the number of starch 
granules increased according to chloroplast size. Larger 
chloroplasts tended to contain greater numbers of starch 
granules; however the pattern of correlation was not 
uniform among the wild type and mutants. In the 
AtiddS mutant, a positive correlation between chloro- 
plast size and the number of starch granules was only 
observed in the chloroplast containing 1-4 granules. It 
appears that the chloroplasts in this mutant are unable 
to expand after reaching critical size; however, they con- 
tinued to store higher numbers of starch granules with- 
out increasing their size. In col mutant, the positive 
correlation was observed in the chloroplast containing 
1-6 granules, whereas the size of the chloroplast con- 
taining 6-8 starch granules tended to decrease in ac- 
cordance with an increase in the number of starch 
granules. The average sizes of chloroplasts containing 7 
and 8 granules were the same as those having 2 starch 
granules. In addition, the size of the col chloroplasts 
containing 10 starch granules was similar to the size of 
chloroplasts containing 6 granules, suggesting this might 
be the critical size limit of the col chloroplast. 

The results indicated that, in addition to having 
defects in their control of SS4 gene expression, both 
AtiddS and col mutants are unable to increase the size 
of chloroplasts, although they may still retain the cap- 
ability to expand their chloroplast to contain relatively 
small numbers of starch granules until the chloroplast 
reaches its critical size limit. Particularly in AtiddS ^ hav- 
ing relatively small starch granules can be another adap- 
tive response caused by chloroplast deformity. The 
observed phenomena may suggest the alternative roles 
of AtlDDS and COL in controlling chloroplast size limit, 
which may synchronize with transcriptional regulation 
of a starch biosynthetic enzyme, SS4. Our findings 
address a question of how starch biosynthesis and 
chloroplast development and/or functions are synergis- 
tically controlled in plant cells. The underlying mechan- 
ism of interaction awaits further investigation. 

Conclusions 

In this study, we proposed a transcriptional regulatory 
network of starch metabolism in Arabidopsis leaves, and 
examined the biological relevance of predicted network 
modules. The general workflow of data acquisition, refine- 
ment, and experimental validation provides a model case 
for reconstruction of transcriptional regulatory network. 
The present work widely utilizes publicly available bio- 
logical information and resource databases, demonstrating 



how they can be integrated to find biological significance 
of predicted network modules. Construction of gene-to- 
gene association network models is based on diurnal regu- 
lation of starch metabolism in leaves where the transcrip- 
tomes oscillate during the day/night cycles. We first 
grouped time-series-dependent significant genes on tran- 
scriptome into four classes showing distinct patterns of 
co-regulation with starch biosynthesis or degradation. A 
particular focus has been placed on relationships be- 
tween TFs, clock genes and starch metabolic genes, to 
obtain transcriptional regulatory network model of starch 
metabolism. The network constructed by the small sam- 
ple inference of GGM suggests relationships between 
TFs and target starch metabolic genes. Gene-to-gene 
associations have been further refined by prediction of 
TF binding sites in target genes and by global co- 
expression analysis. Through these approaches, we finally 
showed the involvement of AtlDDS and COL in tran- 
scriptional regulation of SS4, These regulatory networks 
were considered attributable to daytime starch biosyn- 
thesis by SS4. In addition, AtlDDS and COL were shown 
to control chloroplast development and starch granule 
formation. The present work on TF network modelling 
and examination provides new insights into the regula- 
tory mechanisms of starch biosynthesis and granule for- 
mation in the chloroplast. 

Methods 

Microarray data pre-processing 

This study utilized Arabidopsis Aff)^metrix microarray 
data (CEL files) downloaded from the Nottingham Arabi- 
dopsis Stock Centre's microarray database (NASCArrays) 
[Experiment Reference Number: NASCARRAYS -60] 
http://affymetrix.arabidopsis.info/. This microarray expe- 
riment contains a set of 22 k Arabidopsis ATH-1 genome 
array transcriptome data of leaves at developmental stage 
3.9 taken after 1, 2, 4, 8, and 12 hours in both darkness and 
light [36]. A qspUne' normalization [80] and model-based 
expression index [81] were carried out in the microarray 
pre-processing, which was done using the Affy package in 
Bioconductor http://www.bioconductor.org. 

Significant analysis of time-series data 

The significant test for the time-series data was per- 
formed using the EDGE program (version 1.1. 17S) [39] 
http://faculty.washington.edu/jstorey/edge/. Hypothesis 
testing on time-series expression of each gene was per- 
formed to test whether an average expression constitutes 
a flat line. The gene expression profile was fitted under a 
model based on null and alternative hypotheses. The 
null hypothesis states that there is no differential gene 
expression over a time period. The alternative hypothesis 
states that a gene is differentially expressed over a time 
period. The goodness of fit of 2 models was compared 
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by F-statistic using a significant cut-off based on a false 
discovery rate criterion [82,83]. 

The small-sample Inference by graphical Gaussian model 
(GGM) 

The R package "GeneNet" [33]— available at the R arch- 
ive (CRAN) http://CRAN.R-projectorg/— was used to 
construct the gene association network. The GeneNet 
was developed from the small-sample inference frame- 
work of graphical Gaussian model (GGM) to obtain a 
partial correlation coefficient, which is a correlation be- 
tween 2 variables obtained when eliminating effects of 
other variables. In the case of 3 variables— x, y, and z— 
the partial correlation of x and y when eliminating the 
effect of z, pr-^y^z can be calculated as follows 



P^xy,z 




where r is the correlation coefficient between 2 variables. 
In the case of more than 3 variables, the partial correl- 
ation can be calculated from the following equation. 



The pr^yg is the partial correlation between x and y 
against variable 3 to g. The s^y = the xy^^ element of the 
inverse of variance matrix (S = V'^). The element in 
matrix V is Vij (i, j = 1, ... , n) corresponding to a covari- 
ance between variables i and j. 

For microarray, the number of variables (i.e. genes) is 
much higher than the number of measurements (i.e. 
microarray conditions), thus making the inversion step of 
matrix V invalid. In the new framework of GGM, the par- 
ameter estimation techniques were used to obtain partial 
correlation of small sample size. In order to decide which 
edges are significant to be included in the resulting GGM 
network, statistical significance was further assigned to 
the edges in the GGM network by fitting a mixture 
model (as shown below) to the observed partial correl- 
ation coefficients [33]. 

The distribution of observed partial correlation coeffi- 
cients /(r) is 

/('-') = '/(/o(r;/c) + (l-%)/^(r-) 

where r is the observed partial correlation, rjQ is the un- 
known proportion of null edges, fo is the distribution 
under the null hypothesis of zero-partial correlation, k is 
the degree of freedom, and /a is the distribution of 
observed partial correlations assigned to actually existing 
edges. The two-sided /7-values for each edge correspond- 
ing to the null distribution fo were subsequently calcu- 
lated and followed by false discovery rate multiple testing 



[82,83] to obtain ^-values. The edges with ^-values are 
equal or lower than 0.05 were presented in the resulting 
GGM network in this study. 

Arabidopsis lines and growth conditions 

Arabidopsis ecotype Columbia-0 was used in this 
study. Arabidopsis mutant lines were obtained from 
the T-DNA or transposon inserted mutant collection 
(Col-0 background) of The Arabidopsis Biological Re- 
source Centre (ABRC) [60] and The Nottingham Ara- 
bidopsis Stock Centre (NASC) [61]. Accession 
numbers of AtiddS, c2h2, col, col7, wlim2a, and kh- 
ccch are SALK_1 10990, SALK_070916, SALK_061956, 
SM_3_37788, SALK_067756, and SAIL_672_A10, re- 
spectively. Details of all mutant lines are shown in Add- 
itional file 8: Table S4. The seeds were vernalized in the 
dark for 3 days at 4°C before germination. Plants were 
grown on an equal mixture of sterile vermiculite and 
peat-based growing medium (PRO-MIX Bx/Microrise 
Pro, Premier) in a growth cabinet (SANYO) set at 60% 
humidity and 20-22°C with a light intensity of 
100 (imol m'^ sec'^ and under 12 hr light/ 12 hr dark 
(short day) or 16 hr light/8 hr dark (long day) cycles. 
The trays of plant pots were sub-irrigated with a half- 
strength Arabidopsis liquid nutrient culture [84]. Leaves 
at a developmental stage of 3.90 [62] were harvested 4 
times a day — 1 hr before and after day break and night 
break. Leaves for starch analysis were harvested at the 
end of the light period. 

Expression analysis by quantitative RT-PCR 

Total RNA was extracted from 100-200 mg leaf material 
(3 biological replicates) using Plant RNeasy kit {Qiagen), 
treated with DNasel (Invitrogen), and reverse transcribed 
by Omniscript Reverse Transcriptase {Qiagen), Subse- 
quently, real-time PCR was carried out using SYBR® 
Premix Ex Taq ™ II (Perfect Real Time) (Takara) using 
ubiquitin 2 (UBQ2) as a constitutive internal control. 
Details of the primer pairs used in qRT-PCR experi- 
ments are shown in Additional file 9: Table S5. 

Starch extraction and measurement 

Starch from Arabidopsis leaves (3 biological replicates) 
was extracted using the method described by Smith and 
Zeeman [85]. Gelatinized starch was hydrolyzed to glu- 
cose by incubation for 4 hr at 37°C with a-amylase and 
a-amyloglucosidase. After the enzymatic digestion of 
starch to glucose, the amount of glucose was quantified 
by the capillary electrophoresis photodiode array detec- 
tion (CE-DAD) system according to the manufacturers 
protocol (Agilent) [86]. Leaf starch content was calcu- 
lated from the amount of glucose measured in this 
enzymatically-digested extract. 
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Starch granule morphology analysis by Transmission 
Electron Microscope (TEM) 

Fully expanded Arabidopsis leaves were collected at end 
of day, cut into 2x2 mm^ pieces, and immediately fixed 
with a cold solution of glutaraldehyde. Various para- 
meters describing starch granule morphology (i.e. area, 
perimeter, width, height, and circularity) and number of 
starch granules per chloroplast were measured from 
TEM micrographs using Image J software (version 1.45). 
It was noted that circularity is calculated by the following 
formula: 



Circularity - 



477- X area 
perimeter^ 



A value approaches 1.0 meaning a perfect circle and 
0.0 meaning an elongated shape. The morphology data 
was tested for a statistically difference using a non- 
parametric Mann- Whitney U statistic (P-value < 0.05). 
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